
WORLD INTELLECTUAL PROPERTY ORGANIZATION

International Bureau

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 
G06F 13/00, 13/14 



A1



(11) International Publication Number: WO 99/39272

(43) International Publication Date: 5 August 1999 (05.08.99) 



(21) International Application Number: PCT/US99/02063 

(22) International Filing Date: 29 January 1999 (29.01.99) 



(30) Priority Data: 
60/073,203 



30 January 1998 (30.01.98) US 



(71) Applicant: THE TRUSTEES OF COLUMBIA UNIVERSITY
IN THE CITY OF NEW YORK [US/US]; Broadway and
116th Street, New York, NY 10027-6699 (US).

(72) Inventors: KALVA, Hari; Apartment 8E, 419 West 119th
Street, New York, NY 10027 (US). ELEFTHERIADIS,
Alexandros; Apartment 42, 560 Riverside Drive, New York,
NY 10027 (US).

(74) Agents: TANG, Henry et al.; Baker & Botts, LLP, 30
Rockefeller Plaza, New York, NY 10112-0228 (US).



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GD,
GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP,
KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK,
MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG,
SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZW,
ARIPO patent (GH, GM, KE, LS, MW, SD, SZ, UG, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR,
GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF,
BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN,
TD, TG).



Published 

With international search report. 



(54) Title: METHOD AND SYSTEM FOR CLIENT-SERVER INTERACTION IN INTERACTIVE COMMUNICATIONS 



(57) Abstract 

In an interactive communication system based on MPEG-4, Command descriptors
along with Command Route nodes or Server Routes in the scene description can be
used to support application-specific interactivity. Content selection can be
supported by specifying the presentation in command parameters, with the command
ID indicating that the command is a content selection command. An initial scene can
be created with several images and with text that describes a presentation associated
with an image. Associated with each image and the corresponding text is a content
selection descriptor. When a user clicks on an image, the client transmits the
command containing the selected presentation and the server starts a new
presentation. The technique can be used in any application context, as generally as
HTTP and CGI can be used to implement any server-based application functionality.

METHOD AND SYSTEM FOR CLIENT-SERVER INTERACTION
IN INTERACTIVE COMMUNICATIONS

Technical Field 

The present invention relates to techniques for performing client-server 
interaction in communication systems and, more particularly, in communication 
systems based on the MPEG-4 standard. 

Background of the Invention

Interactivity is a prominent concern in the development of the MPEG-4
international standard (ISO/IEC 14496 Parts 1-6, Committee Draft, October 31, 1997,
Fribourg, Switzerland). A back channel is specified for interactive message support.
However, the syntax and semantics of the messages to be carried through that channel
remain unspecified, and so does the mechanism that triggers the transmission of such
messages. Existing standards such as DSM-CC (ISO/IEC International Standard
13818-6) and RTSP (RFC 2326) support traditional VCR-type interactivity to
reposition a media stream during playback, but this is inadequate for MPEG-4
applications, which require more complex interactive control.

An interactive message can be generated by a certain user action or system
event. It will then be sent to the server, which in turn may modify the stream(s) it is
delivering by adding or removing objects, or switching to an entirely new scene. User
actions may include clicking on an object, input of a text stream, etc. System events
include timers, conditional tests, etc.

Interactivity is application-specific, and one cannot define interactive behavior
completely in terms of user events. To support application-specific interactivity, a
CGI-like approach should be adopted. Specific user events cause application-specific
command data to be sent back to the server. The server can then respond, typically by
sending a scene description update command. This allows complete freedom for
supporting full interactivity as may be required by applications.

MPEG-4 essentially uses two modes of interactivity: local and remote. Local
interactivity can be fully implemented using the native event architecture of MPEG-4
BIFS (Binary Format for Scenes), which is based on the VRML 2.0 ROUTEs design
(see www.vrml.org and "The VRML Handbook", J. Hartman and J. Wernecke,
Addison-Wesley, 1996) and documented in Part 1 of the MPEG-4 specification
(Systems). If the MPEG-4 receiver is hosted in another application, events that need
to be communicated to the MPEG-4 receiver by the application can be translated to
BIFS update commands, as defined in Part 1 of MPEG-4.

Remote interactivity currently consists of URLs. As defined in the MPEG-4
Systems Committee Draft, these can only be used to obtain access to content. As a
result, they cannot be used to trigger commands.






The fact that MPEG-4 Systems already contains local interactive support via 
the use of event source/sink routes that are part of the scene description (BIFS) makes 
it desirable to have a server interaction process that fully integrates with the local 
interactivity model. 

Summary of the Invention 

An objective of the present invention is to provide a technique for 
communicating messages between two entities such as "client" and "server", utilizing 
the MPEG-4 international standard. 

A second objective of the present invention is to provide a technique for
allowing the user or the system to generate such messages in the context of an 
MPEG-4 player or client. 

A third objective is to provide a technique for generating such messages 
consistent with the local interactivity model defined in MPEG-4, which is based on 
the VRML 2.0 specification. 

A further objective is to provide a technique for encoding such messages 
within an MPEG-4 bitstream, as well as to link the encoded messages to the scene 
description. 

Still a further objective is to provide a technique for encoding such messages
in a way that allows a server to easily modify them before sending them for use by the
client. This is important for interactive applications. An example is "cookie"
management, where a server must be able to quickly update the content of the
command with a codeword that stores state information about the user's activities on
the particular site.

In order to meet these and other objectives which will become apparent with 
reference to further disclosure set forth below, the present invention broadly provides 
a technique for incorporating server commands into MPEG-4 clients. The technique 
involves the use of command descriptors, i.e. a special form of descriptors that are 
transmitted together with the scene description information and contain the command 
to be sent back to a server upon triggering of an associated event. The desired event 
sources in the scene description are associated with these command descriptors. 

In one embodiment, the association is performed using server routes. These 
operate similarly to traditional MPEG-4 BIFS routes, but instead of linking a source 
field with a sink field they link a source field with a sink command descriptor. Server 
routes require an extension of the MPEG-4 BIFS ROUTE syntax. 

In another embodiment, the association is performed using command nodes. 
Such nodes contain sink fields, and are associated with command descriptors. This 
technique involves the addition of one more node type to the set of MPEG-4 BIFS 
nodes. 

In both cases, the normal interaction model defined by MPEG-4 can be used 

Fig. 12 is a flow diagram illustrating the process of triggering a server
command in an MPEG-4 client, when Server Routes are used.

Fig. 13 is a flow diagram illustrating the process of assembling the data to be
placed in the command sent back to the server, when Server Routes are used.

Fig. 14 is a flow diagram illustrating the process of triggering a server
command in an MPEG-4 client, when Command Route nodes are used.

Fig. 15 is a flow diagram illustrating the process of assembling the data to be
placed in the command sent back to the server, when Command Route nodes are
used.

Detailed Description

Reference will now be made in detail to the preferred embodiment of the
invention as illustrated in the figures.

MPEG-4 is an international standard being developed under the auspices of
the International Standardization Organization (ISO). Its official designation is
ISO/IEC 14496. Its basic difference with previous ISO or ITU standards such as
MPEG-1, MPEG-2, H.261, or H.263 is that it addresses the representation of
audiovisual objects. Thus, the different elements comprising an audiovisual scene are
first encoded separately, using techniques that are being defined in Parts 2 (Visual)
and 3 (Audio) of the specification. These objects are transmitted to the receiver or
read from a mass storage device together with scene description information that
describes how these objects are to be placed in space and time in order to be presented
to the user.

The coded data for each audiovisual object as well as the scene description
information proper are transmitted in their own "channels" or elementary streams.
Additional control information is also transmitted, as further discussed below, in
order to allow the receiver to correctly associate audiovisual objects referenced in the
scene with the elementary streams that contain their encoded data.

In order to fully describe the structure of MPEG-4, we refer to Fig. 1. At the
bottom of the figure, the various possible delivery systems are shown, including (but
not limited to) ATM, IP, MPEG-2 Transport Stream (TS), DVB, either over a
communication link or a mass storage device. In contrast to MPEG-2, MPEG-4 does
not define its own transport layer facility, in order to allow delivery over a wide
variety of communication environments. For delivery systems that may lack
appropriate multiplexing capability, e.g. GSM wireless data channels, or that require
low delay, a simple multiplexing tool called FlexMux is defined.

This infrastructure is used to deliver to the client a set of elementary streams.
The streams contain any one of scene description information, audiovisual object
data (e.g. coded video, such as an MPEG-2 or MPEG-4 video stream), or control
information (namely, object descriptors). Each elementary stream can contain data of

for both local interactivity, i.e. events generated and processed on the local client, as
well as server interactivity, i.e. events generated on the client that generate commands
that are sent back to the server. Upon triggering of an event associated with a
command descriptor, either via a server route or a regular route to a command node,
the client obtains the command information stored in the command descriptor,
packages it into a command message, preferably using the syntax provided in the
preferred embodiment, and transmits it back to the server using the appropriate back
channel.

The data to be carried by the generated command back to the server are
contained in the command descriptor. Since command descriptors are part of the
overall descriptor framework of MPEG-4, they can be dynamically updated, using
time stamped object descriptor updates. This provides considerable flexibility in
customizing commands, for example to perform "cookie" management.

To further aid the server in processing the generated command, additional
information such as the time the event was generated, the source node, etc., is also
contained in the client's message.
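
As a rough illustration of the message assembly described above, the following Python sketch copies the command data out of a command descriptor and adds the event time and source node. The class and all field names (`descriptor_id`, `command_id`, `es_ids`, `params`, `source_node_id`, `event_time`) are hypothetical and chosen only for illustration; they are not part of the MPEG-4 syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandDescriptor:
    # Hypothetical in-memory form of a command descriptor.
    descriptor_id: int
    command_id: int
    es_ids: List[int] = field(default_factory=list)
    params: List[bytes] = field(default_factory=list)


def build_client_message(desc, source_node_id, event_time):
    """Copy the command data from the descriptor and add event context."""
    return {
        "command_id": desc.command_id,
        "es_ids": list(desc.es_ids),
        "params": list(desc.params),
        "source_node_id": source_node_id,  # node that generated the event
        "event_time": event_time,          # time the event was generated
    }
```

Because the message is built from the descriptor at trigger time, a descriptor update that arrives before the event changes what is sent without touching the scene.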

Brief Description of the Drawing

Fig. 1 is a diagram illustrating the overall structure of an MPEG-4 client or
terminal.

Fig. 2 shows the MPEG-4 System Decoder Model.

Fig. 3 illustrates the method used in MPEG-4 for associating audiovisual
objects with their encoded data in other streams via object descriptors and elementary
stream descriptors.

Fig. 4 shows a generic configuration of a client/server MPEG-4-based
communication system.

Fig. 5 illustrates the method of associating scene description nodes, especially
sensor nodes, to command descriptors using: (a) server routes, and (b) command route
nodes.

Fig. 6 shows the binary syntax of the command descriptor as described in a
preferred embodiment.

Fig. 7 shows the binary syntax of the command descriptor remove command
as described in the preferred embodiment.

Fig. 8 shows the binary syntax of the server route structure, and how it is
added to the main MPEG-4 BIFS scene syntax.

Fig. 9 shows the node syntax and semantics of the Command Route structure.

Fig. 10 shows an indicative list of predefined Command IDs and their
associated interpretation.

Fig. 11 shows the binary syntax of the command contained in a command
descriptor for the preferred embodiment.

VRML is a purely 3-D scene description language. This addresses the needs of
applications that focus on low-cost systems. In contrast to VRML, MPEG-4 does not
use a text-based scene description but instead defines a bandwidth-efficient
compressed representation called BIFS (Binary Format for Scenes).

The BIFS encoding follows closely the textual specification of VRML scenes.
In particular, node coding is performed in a depth-first fashion, similarly to text-based
VRML files. As in VRML, the fields of each type of node assume default values.
Hence only fields that have non-default values need to be specified within each node.
Field coding within a node is performed using a simple index-based method followed
by the value of the coded field. Node type coding is more complicated. In order to
further increase bandwidth efficiency, the context of the parent field (if any) is taken
into account. Each field that accepts children nodes is associated with a particular
node data type. Nodes are then encoded using an index which is particular to this
node data type, in known fashion.
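
The idea of coding only non-default fields by index can be sketched conceptually as follows. This is not the BIFS bitstream format: the field table, its ordering, and the use of Python tuples instead of bits are assumptions made purely to illustrate the principle. The default values shown are those of the VRML 2.0 Transform node.

```python
# Hypothetical field table for a Transform-like node: (name, default value).
TRANSFORM_FIELDS = [
    ("translation", (0.0, 0.0, 0.0)),
    ("rotation", (0.0, 0.0, 1.0, 0.0)),
    ("scale", (1.0, 1.0, 1.0)),
]


def encode_fields(values):
    """Return (field index, value) pairs for fields that differ from defaults."""
    coded = []
    for index, (name, default) in enumerate(TRANSFORM_FIELDS):
        value = values.get(name, default)
        if value != default:
            coded.append((index, value))
    return coded


def decode_fields(coded):
    """Start from the defaults and overwrite only the fields that were coded."""
    values = {name: default for name, default in TRANSFORM_FIELDS}
    for index, value in coded:
        values[TRANSFORM_FIELDS[index][0]] = value
    return values
```

A node that changes only its scale is thus represented by a single index and value, and the decoder restores the remaining fields from the defaults.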

Each coded node can also be assigned a node identifier (an integer, typically).
This allows the reuse of that node in other places in the scene. This is identical to the
USE/DEF mechanism of VRML. More important, however, is the fact that it allows
the node to participate in the interaction process.

The interaction model used in MPEG-4 is the same as in VRML. In
particular, fields of a node can act as event sources, event sinks, or both. An event
source is associated with a particular user action or system event. Examples of user
events are sensor nodes that can detect when the mouse has been clicked. Examples of
system events are timers (TimeSensor node) that are triggered according to the
system's time.

Dynamic scene behavior and interactivity are effected by linking event source
fields to event sink fields. The actual linking is performed using the mechanism of
ROUTEs. MPEG-4 route specifications, if any, are given immediately after the scene
node descriptions. Their encoding is based on the node identifiers for the source and
sink nodes, as well as the field indices of the source and sink fields.
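
A minimal sketch of route-based event delivery, keyed by node identifier and field index as described above, might look like this. The `RouteTable` class, its method names, and the dictionary-based scene are hypothetical, and the sketch delivers a single hop only; it does not model cascaded events.

```python
class RouteTable:
    """Routes keyed by (source node ID, source field index)."""

    def __init__(self):
        self._routes = {}

    def add_route(self, src_node, src_field, dst_node, dst_field):
        key = (src_node, src_field)
        self._routes.setdefault(key, []).append((dst_node, dst_field))

    def propagate(self, src_node, src_field, value, scene):
        """Deliver an event value to every sink field linked to the source."""
        delivered = []
        for dst_node, dst_field in self._routes.get((src_node, src_field), []):
            scene[dst_node][dst_field] = value  # write the value into the sink field
            delivered.append((dst_node, dst_field))
        return delivered
```

Here `scene` is assumed to map node IDs to dictionaries of field index to value, which stands in for the decoded scene graph.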

An important distinction between VRML and MPEG-4 is that in the latter,
scene descriptions can be updated dynamically using time-stamped commands. In
contrast, VRML operates on static "worlds". After a world is loaded, there is no
mechanism to modify it. In MPEG-4, objects can be added or deleted, and parts of
the scene (or the entire scene) can be replaced.

OBJECT DESCRIPTORS

In order to have a very flexible structure (facilitating editing, etc.), the actual 
content of audiovisual objects is not contained within the scene description itself. In 
other words, BIFS only provides the information for the scene structure, as well as 
objects that are purely synthetic, e.g. a red rectangle that is constructed using 

only one type.

The data contained in an elementary stream are packaged according to the
MPEG-4 Sync Layer (SL), which packages access units of the underlying medium
(e.g., a frame of video or audio, or a scene description command) and adds timing
information (clock references and time stamps), sequence numbers, etc. The SL is
shown at the middle portion of Fig. 1.

Encoding of individual audiovisual object data is performed according to Parts
2 and 3 of the MPEG-4 specification. It is furthermore allowed to utilize other
encodings, such as MPEG-1 or MPEG-2. The scene description as well as the control
information (object descriptors) is encoded as defined in Part 1 of MPEG-4.

The receiver processes the scene description information as well as the
decoded audiovisual object data and performs composition, i.e. the process of
combining the objects together in a single unit, and rendering, i.e. the process of
displaying the result on the user's monitor or, in the case of audio, playing it back
through the user's speakers. This is shown at the top of Fig. 1.

Depending on the information contained in the scene description, the user may
have the opportunity to interact with the scene. In addition, the scene description may
contain information that enables dynamic behavior. In other words, the scene itself
may generate events, without user intervention.

The object-based structure of MPEG-4 necessitated the definition of a more
general system decoder model compared with MPEG-2 or other systems. In
particular, as shown in Fig. 2, the receiver is considered to be equipped with a set of
decoders, one for each object. Each decoder has a decoding buffer, as well as a
composition buffer. Decoding buffers are managed by the sender using techniques
similar to MPEG-2, i.e., clock references for clock recovery, and decoding time
stamps for removal of data from the decoding buffer followed by theoretically
instantaneous decoding. Data placed in composition buffers are available for use by
the compositor, and overwrite any previously placed data. The decoding buffers are
filled by the demultiplexer, which is encapsulated within the DMIF (Digital Media
Integration Framework, Part 6 of MPEG-4) Application Interface. This is a
conceptual interface, requiring no further description here.

SCENE DESCRIPTION

The scene description information in MPEG-4 is an extension of the VRML 2.0
(Virtual Reality Modeling Language) specification. VRML uses a tree-structured
approach to define scenes. Each node in the scene performs a composition and/or
grouping operation, with the leaves containing the actual visual or audio information.
Furthermore, nodes contain fields that affect their behavior. For example, a
Transform node contains a Rotation field to define the angle of rotation.

MPEG-4 defines some additional nodes that address 2-D composition, as 

server, or trigger the generation of such messages.

Fig. 4 depicts an example client/server environment. On the left side of the
figure there is an MPEG-4 server, including a pump that performs timed transmission
of data read as SL-packetized streams, and an instance of a DMIF service provider, in
this case utilizing the MPEG-4 FlexMux multiplexing tool.

The use of FlexMux is optional. Other server structures can be used, as
known in the art. For example, the data source could be an MPEG-4 file instead of
SL-packetized streams.

The information generated at the server is sent across a network (e.g. IP or
ATM) to the receiver. On the receiving side, we have a similar instance of a DMIF
service provider which delivers demultiplexed elementary streams to the player. The
DMIF-to-DMIF signaling is one method of performing session set up, and is
described in Part 6 of MPEG-4. Other methods are possible, as known in the art,
including Internet-based protocols such as SIP, SDP, RTSP, etc.

One of the main objectives of the present invention is a process with which
server commands can be first described and transmitted from the server to the client,
then triggered at the client at the appropriate times, and finally sent back to the server
in order to initiate further action.

COMMAND DESCRIPTORS

The Command Descriptor framework provides a means for associating
commands with event sources within the nodes of a scene graph. When a user
interacts with the scene, the associated event is triggered and the commands are
subsequently processed and transmitted back to the server.

The actual interaction itself is specified by the content creator and may be a
mouse click or mouse-over or some other form of interaction (e.g., a system event).

The command descriptor framework consists of three elements: a Command
Descriptor, a server route or Command Route node, and a Command.

A Command Descriptor contains an ID (an integer identifier) as well as the
actual command that will eventually be transmitted back to the server if and when an
associated event is triggered.

The ID is used to refer to this command descriptor from the scene description
information. By separating the command descriptor and the command it contains
from the scene itself, we allow for more than one event to use the same command.
This also permits modification of the command without changing the scene
description in any way.
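
The benefit of this indirection can be sketched with two small lookup tables. The event names, the ID value 7, and the command payloads below are invented for illustration and do not come from the specification.

```python
# Hypothetical scene-side references: event source name to command descriptor ID.
event_bindings = {"poster_click": 7, "title_click": 7}

# Hypothetical descriptor store: command descriptor ID to command payload.
descriptors = {7: b"select?presentation=1"}


def command_for(event_name):
    """Resolve an event source to its current command via the descriptor ID."""
    return descriptors[event_bindings[event_name]]


def update_descriptor(descriptor_id, command):
    """Replace the command; the scene-side bindings are left untouched."""
    descriptors[descriptor_id] = command
```

Both event sources resolve to the same command, and replacing the descriptor changes what either of them sends while `event_bindings`, standing in for the scene description, is never modified.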

The association of the Command Descriptors to the event source field node 
can be performed in different ways. 

First, a Server Route can be added to the regular route facility of BIFS. The 
difference with traditional routes is that the target of the route is not another field, but 

BIFS/VRML nodes. Audiovisual objects that require coded information are
represented in the scene by leaf nodes which either point to a URL or an object
descriptor.

Object descriptors are hierarchically structured information that describes the
elementary streams that comprise the coded representation of a single audiovisual
object. More than one stream may be required, e.g. for stereo or multi-language
audio, or hierarchically coded video. The topmost descriptor is called object
descriptor, and is a shell that is used to associate an object descriptor identifier
(OD-ID) with a set of elementary stream descriptors. The latter contain an ES-ID as well
as information about the type of data contained in the elementary stream associated
with the particular ES-ID. This information tells the receiver, e.g., that a stream
contains MPEG-2 Video data, following the Main Profile at the Main Level.

The mapping of the ES-ID to an actual elementary stream is performed using
a stream map table. For example, it may associate ES-ID 10 with port number
1025. This table is made available to the receiver during session set up. The use of
multiple levels of indirection facilitates manipulation of MPEG-4 content. For
example, remultiplexing would only require a different stream map table. No other
information would have to be modified within the MPEG-4 content.
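
The two levels of indirection can be sketched as two dictionaries. Apart from the pairing of ES-ID 10 with 1025 taken from the example above, every identifier and channel number here is made up for illustration.

```python
from typing import Dict, List

# Object descriptor ID to the ES-IDs of the streams that carry the object.
object_descriptors: Dict[int, List[int]] = {1: [10], 2: [20, 21]}


def resolve_channels(od_id, stream_map):
    """Map an object descriptor to transport channels via its ES-IDs."""
    return [stream_map[es_id] for es_id in object_descriptors[od_id]]


# Two stream map tables for the same content, e.g. before and after remultiplexing.
stream_map_a = {10: 1025, 20: 1026, 21: 1027}
stream_map_b = {10: 5000, 20: 5001, 21: 5002}
```

Switching from `stream_map_a` to `stream_map_b` changes the resolved channels while `object_descriptors` stays exactly as it was, which is the property the text attributes to remultiplexing.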

Object descriptors are transmitted in their own elementary streams, and are 
20 packaged in commands according to the Sync Layer syntax. This allows object 
descriptors to be updated, added, or removed. 

Fig. 3 depicts the process with which object descriptors and elementary stream
descriptors are used to associate audiovisual objects in the scene description with their
elementary streams. First, a special Initial Object Descriptor is used to bootstrap the
MPEG-4 receiver by pointing to the object descriptor stream and the scene descriptor
stream associated with the selected content. This descriptor is delivered to the
receiver during session set up.

The scene description in this example contains an Audio Source node, which
points to one of the object descriptors. The descriptor, in turn, contains an elementary
stream descriptor that provides the ES-ID for the associated stream. The ES-ID is
resolved to an actual transport channel using the stream map table. The scene also
has a Movie Texture node that, in this case, uses scalable video with two streams. As
a result, two elementary stream descriptors are contained, each pointing to the
appropriate stream (base and enhancement layer).

CLIENT/SERVER INTERACTION
From the preceding description, and considering the MPEG-4/VRML scene 
description framework, it is evident that while a rich local interaction framework is 
provided, there is no facility to effect server-based interaction. In particular, there is 
no mechanism with which to either describe messages that are to be sent back to a 

rather the command descriptor. This structure is depicted in Fig. 5(a). In this [...]
descriptor. A sensor node triggers an event to another node with an event source
[...] event source/sink, exposedField in MPEG-4 terminology, which in turn
[...] are direct server routes from the nodes to the descriptor.

The second approach consists of adding a new Command Route node to the
list of nodes supported by MPEG-4. This node has an 'execute' event sink field, as
well as a field containing the command descriptor ID. Whenever the execute field
receives an event, using regular MPEG-4/VRML routes, the command descriptor
[...] command back to the server. This structure
is depicted in Fig. 5(b). Compared with Fig. 5(a), the Server Routes are substituted
by Command Route nodes. The operation in the two cases is essentially the same.

The Command Descriptor syntax is shown in Fig. 6. We use the Flavor [...]
of the MPEG-4 specification (see www.ee.columbia.edu/flavor or Part 1 of MPEG-4).
[...] particular type. We then have the descriptor ID and the command ID. The
latter is used to signal predefined server commands, such as start, pause or [...]
descriptor counted in bytes. Then, a count of the number of ES-IDs that will be
[...] to transmit the message back to the server(s). More than one [...]
communicated to multiple servers. This is followed by the series of the selected
ES-IDs. Finally, a set of application-specific parameters is included. These will be
handed back to the server when the command is triggered. Depending on the value of
the command ID, the semantics of these parameters may be predefined [...]
easily generated on-the-fly by the server. This is an important feature for applications
that involve "cookie" management, among others, where the command parameters
will be continuously updated by the server after processing each user event.

In order to update a Command Descriptor, the server or content creator only
needs to submit a new one with the same ID.

In order to remove a Command Descriptor, a special command is provided.
The syntax is shown in Fig. 7. The command begins with a tag that identifies
it as a Command Descriptor Remove command. It is then followed by the ID
of the Command Descriptor to be removed.

Both the Command Descriptor and Command Descriptor Remove structures
can be carried within the object descriptor stream. Due to the structure of the object
descriptor framework, using tags to identify descriptors, these can be interspersed

CLAIMS

1. A method for communicating command information between a server and 
a client in an interactive communication system, comprising: 

generating a command message including a command, a command descriptor, 
and one of a server route and a command node; and 

transmitting the command message upon occurrence of a triggering event. 

2. The method in accordance with claim 1, wherein the interactive 
communication system is based on MPEG-4. 

3. The method in accordance with claim 2, wherein generating the command
message is consistent with the local interactivity model defined in MPEG-4.

4. The method in accordance with claim 1, wherein the triggering event is a 
mouseclick. 

5. The method in accordance with claim 1, wherein the triggering event is a
timer signal. 

6. The method in accordance with claim 1, wherein command information is 
transmitted from the server to the client.

7. The method in accordance with claim 1, wherein command information is 
transmitted from the client to the server. 

8. An interactive communication system comprising means for 
communicating command information between a server and a client, wherein the 
means for communicating command information comprises: 

means for generating a command message including a command, a command 
descriptor, and one of a server route and a command node; and 

means for transmitting the command message upon occurrence of a triggering
event.

9. The system in accordance with claim 8, based on MPEG-4. 

10. The system in accordance with claim 9, wherein generating the command
message is consistent with the local interactivity model defined in MPEG-4. 

11. The system in accordance with claim 8, wherein the triggering event is a

was transmitted, as well as the set of parameters that were specified in the Command
Descriptor. These commands are transmitted in SL-packetized streams, and hence
full timing and sequencing information can be made available to the server.

PROCESSING EVENTS FOR DISPATCHING COMMANDS

We now discuss in detail the process of generating commands based on user
or system events, starting with the use of Server ROUTEs.

Referring to Fig. 12, upon the generation of a user or system event, the [...]
For the purposes of event propagation, the type of ROUTE does not matter, so that
the same algorithm can be used. If an event is propagated through a Server Route, the
system checks if that event corresponds to a condition associated with the logical
True value. If no, the server command processing terminates; if yes, then the dispatch
process is executed.

This process, for the Server Route case, is depicted in Fig. 13. The process
obtains the Command Descriptor ID from the Server Route. It then correlates it with
the information it has on the Command Descriptors available in the scene. If no
match is found, this is an error and no further action is taken. If a match is found,
then the system examines the command ID in order to see if it corresponds to known
semantics (pre-defined command IDs). If it corresponds to known semantics, then the
system may process the command parameters according to the desired semantics. If it
does not correspond to known semantics, then the system skips this state, and directly
packages the indicated Command and transmits it to the server.
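
The trigger check and dispatch steps described above can be sketched as a single function. This is a simplified reading of the text, not a transcription of Figs. 12 and 13: the dictionary layout of a descriptor, the `known_handlers` mapping for predefined command IDs, and the `send` callback are all assumptions.

```python
def dispatch(route_cd_id, event_value, descriptors, known_handlers, send):
    """Return True if a command was sent to the server, False otherwise."""
    if event_value is not True:
        return False  # event is not associated with logical True: stop here
    desc = descriptors.get(route_cd_id)
    if desc is None:
        return False  # no matching Command Descriptor: error, no further action
    params = desc["params"]
    handler = known_handlers.get(desc["command_id"])
    if handler is not None:
        params = handler(params)  # predefined semantics: process the parameters
    # Package the command and hand it to the (hypothetical) transport callback.
    send({"command_id": desc["command_id"],
          "es_ids": desc["es_ids"],
          "params": params})
    return True
```

The same function would serve the Command Route case if `route_cd_id` is read from the node's descriptor field instead of from a Server Route.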

In the case of Command Route nodes, referring to Fig. 14, upon the
generation of a user or system event, the receiver propagates the event through the
network of ROUTEs. If an event reaches the 'execute' field of a Command Route
node, the system checks if that event corresponds to a condition associated with the
logical True value. If no, server command processing terminates; if yes, the dispatch
process is executed.

This process, for the Command Route node case, is depicted in Fig. 15. The
sequence of steps is essentially identical to that of Fig. 13, the difference being that
the reference for the Command Descriptor ID is now in the Command Route node
rather than a Server Route.

mouse click.

12. The system in accordance with claim 8, wherein the triggering event is a 
timer signal. 

13. The system in accordance with claim 8, wherein command information is
transmitted from the server to the client. 

14. The system in accordance with claim 8, wherein command information is 
transmitted from the client to the server. 

CommandRoute

Node interface
CommandRoute {

eventIn SFBool Execute FALSE

field SFUrl CommandDescriptor Q 

} 

Functionality and semantics 

The CommandRoute node is used to support server interaction. A command route is executed when an
event is received on the execute field, for example, from a touch sensor. The execution of a command
route involves communicating the command pointed to by the commandDescriptor to the server. The
commandDescriptor field contains either a URL to the command descriptor or the ID of the Command
Descriptor to be associated with this CommandRoute node. Commands are typically sent to a server
using DMIF's DAI User_Command primitives. The node update mechanism can be used to change the
command descriptor ID. This allows supporting different interaction behavior for the same user
interaction at different times (before and after node update).
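
The node semantics above can be mimicked with a small class. This is only a behavioral sketch: the `send` callback stands in for whatever mechanism delivers the command to the server and is an assumption, as are the Python attribute names.

```python
class CommandRoute:
    """Sketch of a node with an 'execute' event sink and a descriptor reference."""

    def __init__(self, command_descriptor, send):
        self.command_descriptor = command_descriptor  # ID or URL of the descriptor
        self._send = send  # hypothetical callback that delivers to the server

    def execute(self, value):
        """Event sink: forward the referenced descriptor when the event is TRUE."""
        if value is True:
            self._send(self.command_descriptor)
```

Assigning a new value to `command_descriptor` plays the role of a node update: the same user interaction then triggers a different command.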

class CommandDescriptor : bit (8) CommandDescriptorTag = 0x05 {
    bit (16) CommandDescriptorID;
    bit (16) CommandID;
    bit (16) length;

    // stream count; number of ES_IDs associated with this message
    unsigned int (8) count;

    // ES_ID (channel numbers) of the streams affected by the command
    unsigned int (16) ES_ID[count];

    // application-defined parameters
    do {
        unsigned int (8) paramLength;
        char (8) commandParam[paramLength];
    }
    while (paramLength != 0);
}

FIG. 6
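
As a hedged illustration of the layout in Fig. 6, the following Python sketch serializes the listed fields in order. Two points are assumptions rather than facts from the figure: multi-byte fields are written most significant byte first, and `length` is taken to count the bytes that follow the length field. The function name is hypothetical.

```python
import struct

COMMAND_DESCRIPTOR_TAG = 0x05


def encode_command_descriptor(descriptor_id, command_id, es_ids, params):
    """Serialize the fields in the order listed in Fig. 6 (big-endian assumed)."""
    body = struct.pack("!B", len(es_ids))                    # count
    body += b"".join(struct.pack("!H", es_id) for es_id in es_ids)
    for param in params:
        if len(param) not in range(1, 256):
            raise ValueError("each parameter must be 1..255 bytes long")
        body += struct.pack("!B", len(param)) + param        # paramLength + data
    body += b"\x00"  # a zero paramLength ends the do/while loop
    # Assumption: 'length' counts the bytes that follow the length field.
    header = struct.pack("!BHHH", COMMAND_DESCRIPTOR_TAG,
                         descriptor_id, command_id, len(body))
    return header + body
```

For example, `encode_command_descriptor(7, 1, [10], [b"ab"])` yields a 14-byte descriptor: a 7-byte header followed by the count, one ES_ID, one two-byte parameter, and the terminating zero.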



class CommandDescriptorRemove : bit (8) CommandDescriptorRemoveTag = 0x06 {
    bit (16) CommandDescriptorID;
}

FIG. 7
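
The remove command of Fig. 7 is just a tag followed by a 16-bit ID, which the sketch below encodes and decodes. Big-endian byte order is an assumption, and both function names are hypothetical.

```python
import struct

COMMAND_DESCRIPTOR_REMOVE_TAG = 0x06


def encode_command_descriptor_remove(descriptor_id):
    """Tag byte followed by the 16-bit ID of the descriptor to remove."""
    return struct.pack("!BH", COMMAND_DESCRIPTOR_REMOVE_TAG, descriptor_id)


def decode_command_descriptor_remove(data):
    """Return the descriptor ID, or raise if the tag does not match."""
    tag, descriptor_id = struct.unpack("!BH", data[:3])
    if tag != COMMAND_DESCRIPTOR_REMOVE_TAG:
        raise ValueError("not a Command Descriptor Remove command")
    return descriptor_id
```

A receiver that dispatches on the leading tag byte can tell this command apart from the 0x05 Command Descriptor of Fig. 6 when both appear in the same stream.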

class BIFSScene {
    SFNode nodes (SFTopNode);
    bit (1) hasROUTEs;
    if (hasROUTEs)
        ROUTES routes;
    bit (1) hasServerROUTEs;

    // the following is added to the MPEG-4 syntax
    if (hasServerROUTEs)
        ServerROUTEs sroutes;
}

// modification of MPEG-4 ROUTES structure to point to command descriptor
class ServerROUTEs {
    bit (1) isUpdateable;
    if (isUpdateable)
        bit (10) srouteID;

    bit (10) outNodeID;

    NodeData nodeOUT = GetNodeFromID (outNodeID); // get source node
    int (nodeOUT.nOUTbits) outFieldRef; // event source field index

    bit (10) CD_ID; // event sink - command descriptor ID
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class Coitutimand I 
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// command ID 

bit (16) CommandID; 

unsigned int (8) count; 

// ES Id of the streams 
unsigned int (16) ES_ID[count] ; 

// application-defined parameters 

d ° { unsigned int (8) p^ramLength; 

char (8) commandParam-IparamLength), 

while (paramLength!=0) ; 

FIG . 11 
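
A server-side reader for the fields visible in Fig. 11 might be sketched as follows. It assumes big-endian byte order and that the command consists only of the fields shown in the figure as reproduced here; the function name is hypothetical.

```python
import struct


def parse_command(data):
    """Parse CommandID, the ES_ID list, and the parameter list (big-endian assumed)."""
    command_id, count = struct.unpack_from("!HB", data, 0)
    offset = 3
    es_ids = list(struct.unpack_from("!%dH" % count, data, offset))
    offset += 2 * count
    params = []
    while True:
        length = data[offset]  # paramLength
        offset += 1
        if length == 0:
            break  # a zero paramLength terminates the list
        params.append(data[offset:offset + length])
        offset += length
    return command_id, es_ids, params
```

The parameter loop mirrors the do/while construct of the figure: lengths and payloads alternate until a zero length is read.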